Orphanhood in Colombia

Mortality and Fertility rates

The mortality and fertility rates are computed from population estimates for 1998 to 2021. These population estimates are based on the censuses of 2005 and 2018; the remaining years were obtained by linearly interpolating (and extrapolating) these two data sources at the desired resolution.
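As an illustration of the interpolation step, the following minimal sketch draws a straight line through the two census values of a single stratum and evaluates it for every year from 1998 to 2021. The function name and the census values (300 and 40) are hypothetical and chosen only to show how extrapolation can produce a negative estimate; the actual implementation may differ.

```python
def interpolate_population(pop_2005, pop_2018, years):
    """Linearly interpolate (and extrapolate) between the two census years.

    Hypothetical helper: the source only states that a linear method is used.
    """
    slope = (pop_2018 - pop_2005) / (2018 - 2005)
    return {year: pop_2005 + slope * (year - 2005) for year in years}


# Hypothetical stratum whose population shrinks between the two censuses.
estimates = interpolate_population(300, 40, range(1998, 2022))

# Years after 2018 are extrapolated and can drop below zero.
negative_years = [year for year, pop in estimates.items() if pop < 0]
print(negative_years)
```

For a stratum that grows between the censuses, the same line would instead become negative when extrapolated backwards towards 1998.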

However, especially at fine resolutions (e.g., at the municipality level), the population estimates might be inaccurate, and so might the mortality and fertility rates derived from them. This happens more often for estimates before 2005. Notably, the following problems may occur:

  1. Negative population estimate (from the linear interpolation method).
  2. Impossible population estimates with respect to the number of deaths (e.g., the “number of deaths” estimates are larger than the population estimates for some strata).
  3. Impossible population estimates with respect to the number of births (e.g., the number of births is non-zero while the population is zero).
  4. Unlikely population estimates with respect to the number of deaths.
  5. Unlikely population estimates with respect to the number of births (e.g., the number of births is 5+ times the population size for some strata).
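The first three problems in the list above can be detected row by row. The sketch below assumes each row carries a population estimate together with its death and birth counts; the function name and flag labels are hypothetical. Problems 4 and 5 are not covered here because they require a threshold derived from the time series.

```python
def flag_row(population, deaths=None, births=None):
    """Return the problems (1 to 3 in the list above) found in one row.

    Hypothetical helper; `deaths` and `births` may be None when the
    corresponding count is not available for the stratum.
    """
    flags = []
    if population < 0:
        flags.append("negative_population")        # problem 1
    if deaths is not None and deaths > population:
        flags.append("impossible_vs_deaths")       # problem 2
    if births is not None and births > 0 and population == 0:
        flags.append("impossible_vs_births")       # problem 3
    return flags


print(flag_row(10, deaths=12))          # more deaths than people
print(flag_row(0, deaths=0, births=2))  # births in an empty stratum
```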

To address these issues, we apply the following corrections (numbered to match the problems above):

  1. Replace the negative population estimates with later year estimates (if the estimates are always negative, set the population size to 0).
  2. Treat the impossible values as missing data (i.e., NA), and use some imputation technique to deal with these cases. In particular, we can use the mean (or median) of year + 1 and year - 1.
  3. Same as in 2.
  4. Set the population size such that the mortality rate equals the (lower or upper) limit beyond which it would be considered an outlier. To determine the threshold defining an outlier, we analyze the variation of the corresponding time series (say, \(\text{mean} \pm 3\times\text{sd}\)) over a pre-defined time period; in particular, we considered the interpolated (not extrapolated) interval, i.e., 2005-2018.
  5. Same as in 4.
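The corrections above can be sketched for a single stratum as follows. All function names are hypothetical, and several edge cases are assumptions not specified in the text: a negative value with no later non-negative estimate is set to 0, a missing value with only one available adjacent year uses that year alone, the lower limit is truncated at 0, and the sample standard deviation is used.

```python
from statistics import mean, stdev


def replace_negatives(pop):
    """Correction 1: replace negatives with the next later non-negative value."""
    fixed = list(pop)
    next_valid = None
    for i in range(len(fixed) - 1, -1, -1):  # walk from the last year backwards
        if fixed[i] >= 0:
            next_valid = fixed[i]
        else:
            # Assumption: fall back to 0 when no later valid estimate exists.
            fixed[i] = next_valid if next_valid is not None else 0
    return fixed


def impute_from_adjacent_years(pop):
    """Corrections 2 and 3: fill None with the mean of year - 1 and year + 1."""
    fixed = list(pop)
    for i, value in enumerate(pop):
        if value is None:
            adjacent = [pop[j] for j in (i - 1, i + 1)
                        if 0 <= j < len(pop) and pop[j] is not None]
            if adjacent:
                fixed[i] = sum(adjacent) / len(adjacent)
    return fixed


def outlier_limits(reference_rates, k=3.0):
    """Limits mean +/- k * sd computed over the reference period (2005-2018)."""
    centre, spread = mean(reference_rates), stdev(reference_rates)
    return max(centre - k * spread, 0.0), centre + k * spread


def adjusted_population(count, population, lower, upper):
    """Corrections 4 and 5: move the rate back to the nearest limit.

    Assumes population > 0 and upper > 0.
    """
    rate = count / population
    if rate > upper:
        return count / upper
    if count > 0 and rate < lower:
        return count / lower
    return population


print(replace_negatives([-40, -10, 20, 50]))
print(impute_from_adjacent_years([100, None, 120]))
```

Adjusting the population rather than the count keeps the observed number of deaths or births untouched, which matches the description of corrections 4 and 5.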

After post-processing the data this way, we may still spot some outliers for specific municipalities and strata (these outliers are defined based on the averaged time series for the mortality or fertility rates). To overcome this problem, we (once again) identify these values and replace them with NA; as before, \(x_i\) is an outlier if it does not fall within \(\text{mean}(\mathbf{x}) \pm 3\times\text{sd}(\mathbf{x})\). The imputation is then based on the first-order neighbors; i.e., we replace each flagged value by the mean (or median) of the neighbors’ rates.
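A minimal sketch of this spatial step is given below. It assumes that \(\mathbf{x}\) collects the rates of one stratum across municipalities and that a hypothetical adjacency mapping `neighbours` lists the first-order neighbors of each municipality. Excluding neighbors that are themselves flagged is also an assumption.

```python
from statistics import mean, stdev


def impute_spatial_outliers(rates, neighbours, k=3.0):
    """Replace rates outside mean +/- k * sd by the mean of neighbouring rates.

    rates:      {municipality: rate} for one stratum (hypothetical layout)
    neighbours: {municipality: [first-order neighbours]} (hypothetical layout)
    """
    values = list(rates.values())
    centre, spread = mean(values), stdev(values)
    outliers = {m for m, r in rates.items() if abs(r - centre) > k * spread}

    fixed = dict(rates)
    for m in outliers:
        # Assumption: neighbours that are outliers themselves are ignored.
        valid = [rates[n] for n in neighbours.get(m, []) if n not in outliers]
        if valid:
            fixed[m] = mean(valid)
    return fixed


# Hypothetical example: 19 municipalities with a rate of 0.01 and one at 0.9.
rates = {m: 0.01 for m in range(19)}
rates[19] = 0.9
fixed = impute_spatial_outliers(rates, {19: [0, 1]})
```

Note that with the sample standard deviation a single value can only exceed the \(3\times\text{sd}\) limit when the vector has more than ten entries, so the rule is not informative for very short vectors.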

Lastly, as the corrections were made independently for the mortality and fertility rates, the corrected population estimates underlying the two rates may differ for some combinations of municipality, gender, and age group. To correct this, we simply average the two population estimates and re-compute the rates accordingly.
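For one combination of municipality, gender, and age group, this reconciliation amounts to the short sketch below. The function name and the numbers are hypothetical.

```python
def reconcile(deaths, births, pop_mortality, pop_fertility):
    """Average the two corrected populations and recompute both rates."""
    population = (pop_mortality + pop_fertility) / 2
    if population == 0:
        return population, None, None  # rates are undefined without population
    return population, deaths / population, births / population


# Hypothetical stratum whose two corrected populations disagree (90 vs 110).
population, mortality_rate, fertility_rate = reconcile(2, 10, 90, 110)
print(population, mortality_rate, fertility_rate)
```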

These are the final total population estimates. As a remark, the aforementioned correction process was applied only to the age groups 10+; i.e., the population estimates for individuals aged 0-9 were kept as original (as we are not using them when estimating the mortality and fertility rates).


When aggregating the data over the municipalities, the yearly total population estimates are as follows.

Correction details

Given the original population data set, after interpolating (and extrapolating) the unobserved years, we applied the aforementioned corrections to the following numbers of rows:

  1. Negative population estimates: \(1,547\) rows (out of \(511,632\), i.e., \(\approx 0.30\%\)).
  2. Impossible population estimates (mortality): \(27\) rows (out of \(511,632\), i.e., \(\approx 0.01\%\)).
  3. Impossible population estimates (fertility): \(138\) rows (out of \(484,704\), i.e., \(\approx 0.03\%\)).
  4. Unlikely population estimates (mortality): \(16,673\) rows (out of \(511,632\), i.e., \(\approx 3.26\%\)). More specifically, \(11,410\) rows in 1998-2005 (out of \(170,544\), i.e., \(\approx 6.69\%\)), and \(5,263\) rows in 2006-2021 (out of \(341,088\), i.e., \(\approx 1.54\%\)).
  5. Unlikely population estimates (fertility): \(19,927\) rows (out of \(484,704\), i.e., \(\approx 4.11\%\)). More specifically, \(17,248\) rows in 1998-2005 (out of \(161,568\), i.e., \(\approx 10.68\%\)), and \(2,679\) rows in 2006-2021 (out of \(323,136\), i.e., \(\approx 0.83\%\)).


Extra step: detection and correction of spatial outliers (see the procedure described in the previous section).

  1. Spatial outliers (mortality): \(2,330\) rows (out of \(511,632\), i.e., \(\approx 0.46\%\)).
  2. Spatial outliers (fertility): \(2,126\) rows (out of \(484,704\), i.e., \(\approx 0.44\%\)).


Final step: average population based on the mortality and fertility rates.

  1. Make the population estimates underlying the mortality and fertility rates equal: \(35,748\) rows (mortality rates: out of \(511,632\), i.e., \(\approx 6.99\%\), and fertility rates: out of \(484,704\), i.e., \(\approx 7.38\%\)).



FINAL REMARK: The total number of rows is, at most, \(511,632\); i.e., \(24\) years \(\times\) \(1,122\) municipalities \(\times\) \(19\) strata (\(9\) age groups for women and \(10\) age groups for men). When dealing with fertility data, there are \(8\) age groups for women and \(10\) age groups for men, which gives \(484,704\) rows.
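The row totals quoted in this section follow directly from the dimensions listed in the remark. The short check below reproduces them, together with one of the reported percentages; the variable and function names are illustrative only.

```python
YEARS = 2021 - 1998 + 1        # 24 years
MUNICIPALITIES = 1122

mortality_rows = YEARS * MUNICIPALITIES * (9 + 10)   # women + men age groups
fertility_rows = YEARS * MUNICIPALITIES * (8 + 10)

# Sub-period totals used for the "unlikely estimates" breakdown (mortality).
rows_1998_2005 = 8 * MUNICIPALITIES * (9 + 10)
rows_2006_2021 = 16 * MUNICIPALITIES * (9 + 10)


def share(rows, total):
    """Percentage of corrected rows, rounded to two decimals."""
    return round(100 * rows / total, 2)


print(mortality_rows, fertility_rows)
print(share(16673, mortality_rows))
```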


Results

We now analyse the mortality and fertility rates obtained after processing the population data (as per the above procedure).


The figures below show the time series for the mortality and fertility counts (and rates) of female and male individuals in the 25-29 age group in 40 randomly selected municipalities.

[Figure panels: Mortality Female; Mortality Male; Fertility Female; Fertility Male]


Next, we show the estimated mortality and fertility rates (mean and standard deviation) of female and male individuals in all age groups and municipalities.

[Figure panels: Mortality Female; Mortality Male; Fertility Female; Fertility Male]